Final Project Spring Christopher Rabeony
Background Information: Spotify is a Swedish audio streaming platform that provides music and podcasts from record labels and media companies.
Introduction to my Project
The data for this project come from the ‘Spotify Top 200’ weekly streaming charts. The URL for this website is https://spotifycharts.com/regional/.
Six Different Countries
Spotify not only includes streaming information from the United States, but from other countries as well. For this project we will include data from the United States, Argentina, Bolivia, the United Kingdom, Belgium, and Australia.
Because the extraction and cleaning steps are the same for every country, I wrap them in helper functions to avoid repeating code.
Specify and read in the URL. Extract the table node and select the first table as a data frame.
USA <- get_tbl("https://spotifycharts.com/regional/us/weekly/latest")
head(USA, 3)
## Var.1 Var.2 Var.3
## 1 NA 1 NA
## 2 NA 2 NA
## 3 NA 3 NA
## Track
## 1 RAPSTAR\n by Polo G
## 2 Kiss Me More (feat. SZA)\n by Doja Cat
## 3 MONTERO (Call Me By Your Name)\n by Lil Nas X
## Streams
## 1 12,305,863
## 2 11,112,936
## 3 9,994,317
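The definition of `get_tbl()` is not shown in these slides. The following is a minimal sketch of what such a helper might look like, assuming it wraps the rvest package and returns the first HTML table on the page as a data frame. The body is a hypothetical reconstruction, not the original function.

```r
library(rvest)

# Hypothetical reconstruction of get_tbl(); the original definition is not shown.
get_tbl <- function(url) {
  page   <- read_html(url)             # download and parse (literal HTML also works)
  tables <- html_nodes(page, "table")  # every <table> node on the page
  if (length(tables) == 0) stop("no <table> node found")
  # Keep only the first table and return it as a plain data frame
  as.data.frame(html_table(tables[[1]]), stringsAsFactors = FALSE)
}
```

Selecting the first table explicitly matches the description above; a page with several tables would need a more specific CSS selector.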
Tables imported from web pages usually need cleaning up, so the next step tidies the raw table.
spotify_USA <- clean_df(USA)
spotify_USA$Code <- str_replace_all(spotify_USA$Code, "\\d{1,3}", "NA")
head(spotify_USA, 4)
## Code
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## Track
## 1 RAPSTAR
## 2 Kiss Me More (feat. SZA)
## 3 MONTERO (Call Me By Your Name)
## 4 Peaches (feat. Daniel Caesar & Giveon)
## Artist Streams
## 1 Polo G 12305863
## 2 Doja Cat 11112936
## 3 Lil Nas X 9994317
## 4 Justin Bieber 8697957
The data we extracted above list the top 200 songs streamed in a given country. For each song we get its name, the credited artist or artists, the total number of streams, and its rank in the top 200. I created a column called “Code” which records the continent that each data table represents. For the USA table we use “NA” (North America).
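The cleaning helper `clean_df()` is also used without its definition. Judging from the raw and cleaned tables, it splits the combined track cell into a title and an artist, strips the thousands separators from the stream counts, and keeps the rank in a column named `Code`. A base R sketch under those assumptions (the body is hypothetical):

```r
# Hypothetical reconstruction of clean_df(); the original definition is not shown.
clean_df <- function(raw) {
  # "RAPSTAR\n   by Polo G" -> c("RAPSTAR", "Polo G")
  parts <- strsplit(raw$Track, "[[:space:]]*\n[[:space:]]*by[[:space:]]+")
  data.frame(
    Code    = as.character(raw$Var.2),  # chart rank, later overwritten with a continent code
    Track   = trimws(vapply(parts, `[`, character(1), 1)),
    Artist  = trimws(vapply(parts, `[`, character(1), 2)),
    Streams = as.numeric(gsub(",", "", raw$Streams)),  # "12,305,863" -> 12305863
    stringsAsFactors = FALSE
  )
}
```

One thing to keep in mind: the continent code "NA" is stored as a character string, not as R's missing value, so `is.na(spotify_USA$Code)` is FALSE for every row. Assigning the code directly, for example `spotify_USA$Code <- "NA"`, would give the same result as the regular-expression replacement.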
The other five countries are processed in the same way and stored in the same format.
Argentina (South America)
head(spotify_AR, 2)
## Code
## 1 SA
## 2 SA
## Track
## 1 Fiel
## 2 L-Gante: Bzrp Music Sessions, Vol.38
## Artist Streams
## 1 Los Legendarios, Wisin, Jhay Cortez 2118367
## 2 Bizarrap, L-Gante 1932223
Bolivia (South America)
head(spotify_BO, 2)
## Code
## 1 SA
## 2 SA
## Track
## 1 Botella Tras Botella
## 2 Pareja Del Año
## Artist Streams
## 1 Gera MX, Christian Nodal 403616
## 2 Sebastian Yatra, Myke Towers 261131
United Kingdom (Europe)
head(spotify_UK, 2)
## Code
## 1 EU
## 2 EU
## Track
## 1 MONTERO (Call Me By Your Name)
## 2 Peaches (feat. Daniel Caesar & Giveon)
## Artist Streams
## 1 Lil Nas X 3460160
## 2 Justin Bieber 2552651
Belgium (Europe)
BE <- get_tbl("https://spotifycharts.com/regional/be/weekly/latest")
spotify_BE <- clean_df(BE)
spotify_BE$Code <- str_replace_all(spotify_BE$Code, "\\d{1,3}", "EU")
head(spotify_BE, 2)
## Code
## 1 EU
## 2 EU
## Track
## 1 MONTERO (Call Me By Your Name)
## 2 Friday (feat. Mufasa & Hypeman) - Dopamine Re-Edit
## Artist Streams
## 1 Lil Nas X 519558
## 2 Riton, Nightcrawlers 398628
Finally, Australia (Australia)
head(spotify_AU, 4)
## Code
## 1 AUS
## 2 AUS
## 3 AUS
## 4 AUS
## Track
## 1 MONTERO (Call Me By Your Name)
## 2 Peaches (feat. Daniel Caesar & Giveon)
## 3 Heat Waves
## 4 Kiss Me More (feat. SZA)
## Artist Streams
## 1 Lil Nas X 1626491
## 2 Justin Bieber 1513052
## 3 Glass Animals 1486195
## 4 Doja Cat 1443893
Next Step: Data Manipulation and Representation
With our datasets imported and cleaned, we can manipulate the data to reveal new information.
spotify_Global <- bind_rows(spotify_EU, spotify_NA, spotify_AUS, spotify_SA) %>%
arrange(desc(Streams)) %>%
group_by(Code)
spotify_Global
## # A tibble: 993 x 4
## # Groups: Code [4]
## Code Track Artist Streams
## <chr> <chr> <chr> <dbl>
## 1 NA "RAPSTAR … Polo G 1.23e7
## 2 NA "Kiss Me More (feat. SZA) … Doja Cat 1.11e7
## 3 NA "MONTERO (Call Me By Your Name) … Lil Nas X 9.99e6
## 4 NA "Peaches (feat. Daniel Caesar & Giveon)… Justin Bieber 8.70e6
## 5 NA "Save Your Tears (with Ariana Grande) (… The Weeknd 7.80e6
## 6 NA "Levitating (feat. DaBaby) … Dua Lipa 7.68e6
## 7 NA "deja vu … Olivia Rodrigo 7.40e6
## 8 NA "Astronaut In The Ocean … Masked Wolf 6.28e6
## 9 NA "Heartbreak Anniversary … Giveon 5.92e6
## 10 NA "Leave The Door Open … Bruno Mars, Anderson … 5.69e6
## # … with 983 more rows
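The continent-level frames (`spotify_EU`, `spotify_NA`, `spotify_AUS`, `spotify_SA`) are not built in the code shown. Since the combined table has 993 rows rather than 6 x 200 = 1,200, one plausible reading is that tracks charting in both countries of a continent were merged and their streams summed. The sketch below illustrates that idea in base R; `combine_countries()` is a hypothetical helper, not a function from the original project.

```r
# Hypothetical helper: stack country tables and sum the streams of tracks
# that appear in more than one of them.
combine_countries <- function(..., code) {
  stacked <- rbind(...)
  merged  <- aggregate(Streams ~ Track + Artist, data = stacked, FUN = sum)
  merged$Code <- code
  # Largest stream counts first, columns in the same order as the country tables
  merged[order(-merged$Streams), c("Code", "Track", "Artist", "Streams")]
}
```

Under this assumption, a call such as `combine_countries(spotify_UK, spotify_BE, code = "EU")` would produce the European table. Note that `aggregate()` drops rows with a missing track or artist.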
A single data frame can hold a plethora of information and answer a lot of questions.
continentStreams
Why is streaming so much higher in North America than in Europe, even though Spotify was founded in Sweden and launched in the United Kingdom before the United States?
Let’s look at the distribution of total streams per continent for the latest week.
streamDist
## # A tibble: 4 x 2
## Code totalStreams
## <chr> <dbl>
## 1 NA 485414548
## 2 EU 153489363
## 3 SA 95146044
## 4 AUS 77250137
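The code that produces `streamDist` is not shown. A table of this shape can be obtained by summing `Streams` within each continent code; the helper below is a hypothetical base R sketch of that step (the original most likely used dplyr).

```r
# Hypothetical helper: total streams per continent code, largest first.
total_streams_by_code <- function(df) {
  totals <- aggregate(Streams ~ Code, data = df, FUN = sum)
  names(totals)[names(totals) == "Streams"] <- "totalStreams"
  totals[order(-totals$totalStreams), ]
}
```

Because the code "NA" is an ordinary string here, North America is grouped like any other continent rather than being dropped as a missing value.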
Let’s create a pie chart to see how each continent’s streaming numbers compare to the others.
How is the total number of global streams distributed across the continents?
Stream_circle
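`Stream_circle` is printed without its construction. As a rough illustration, the shares can be computed from the totals reported in `streamDist` and drawn with base graphics. This sketch assumes nothing about how the original chart was built; the variable names `share` and `labels` are introduced here only for the example.

```r
# Totals copied from the streamDist output.
streamDist <- data.frame(
  Code         = c("NA", "EU", "SA", "AUS"),
  totalStreams = c(485414548, 153489363, 95146044, 77250137),
  stringsAsFactors = FALSE
)

# Each continent's fraction of all streams in the four tables
streamDist$share <- streamDist$totalStreams / sum(streamDist$totalStreams)

# Label every slice with its code and percentage, then draw the pie
labels <- sprintf("%s (%.1f%%)", streamDist$Code, 100 * streamDist$share)
pie(streamDist$totalStreams, labels = labels,
    main = "Share of weekly streams by continent")
```

With these totals, North America accounts for roughly 60% of the streams in the sample.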
spotify_topSongs <- spotify_Global %>%
arrange(desc(Streams)) %>%
group_by(Code) %>% slice(1:3)
eachCountry
As we can see, there’s a lot of crossover when it comes to artists and their international audiences.
Now that we have looked closely at the streaming information for each continent, let’s look at the overall global picture.
spotify_globalSongs
## # A tibble: 699 x 3
## # Groups: Track [693]
## Track Artist Streams
## <chr> <chr> <dbl>
## 1 "MONTERO (Call Me By Your Name) … Lil Nas X 1.64e7
## 2 "RAPSTAR … Polo G 1.64e7
## 3 "Kiss Me More (feat. SZA) … Doja Cat 1.53e7
## 4 "Peaches (feat. Daniel Caesar & Giveon) … Justin Bieber 1.42e7
## 5 "Levitating (feat. DaBaby) … Dua Lipa 1.10e7
## 6 "deja vu … Olivia Rodrigo 1.07e7
## 7 "Save Your Tears (with Ariana Grande) (Remix… The Weeknd 1.05e7
## 8 "Astronaut In The Ocean … Masked Wolf 9.40e6
## 9 "Leave The Door Open … Bruno Mars, Anderson .… 8.20e6
## 10 "Heartbreak Anniversary … Giveon 8.12e6
## # … with 689 more rows
Let’s create a graphical representation of the data frame above, charting the most streamed songs overall on the Spotify website.
popularSongs
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Blues is 9
## Returning the palette you asked for with that many colors
Another tool I would like to integrate into my project is the Spotify Developer Tools Web API. Obtaining an API key is free and simple once an account has been created.
library(spotifyr)
The spotifyr package pulls a variety of audio features from Spotify’s Web API. Once we have the client credentials and an access token, we can retrieve a variety of information in seconds.
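# source() defines api.key.spotify and api.spotify.clientID; its return value is not needed.
source("/Users/chris/Documents/DataWranglingHusbandry/DataWranglingFinalProject/api-keysSpotify.R")
# Check that each variable holds the value its environment variable expects:
# the variable named api.spotify.clientID is assigned to the secret here.
Sys.setenv(SPOTIFY_CLIENT_ID = api.key.spotify)
Sys.setenv(SPOTIFY_CLIENT_SECRET = api.spotify.clientID)
access_token <- get_spotify_access_token()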
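By default `get_spotify_access_token()` reads the two credentials from the environment variables set above, so a missing or empty value only surfaces as a failed request. A small guard can make that failure easier to diagnose. `require_env()` below is a hypothetical helper, not part of spotifyr.

```r
# Hypothetical helper: stop with a clear message when a credential is missing.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("environment variable not set: ", name, call. = FALSE)
  }
  invisible(value)
}
```

Calling `require_env("SPOTIFY_CLIENT_ID")` and `require_env("SPOTIFY_CLIENT_SECRET")` before requesting the token would catch an empty credential early.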
Now let’s go back to the data that contained the most streamed songs globally.
head(spotify_globalSongs)
## # A tibble: 6 x 3
## # Groups: Track [6]
## Track Artist Streams
## <chr> <chr> <dbl>
## 1 "MONTERO (Call Me By Your Name) … Lil Nas X 1.64e7
## 2 "RAPSTAR … Polo G 1.64e7
## 3 "Kiss Me More (feat. SZA) … Doja Cat 1.53e7
## 4 "Peaches (feat. Daniel Caesar & Giveon) … Justin Bieb… 1.42e7
## 5 "Levitating (feat. DaBaby) … Dua Lipa 1.10e7
## 6 "deja vu … Olivia Rodr… 1.07e7
Let’s take the top 5 artists on this list and find out whether something in their music makes their songs the most popular in the world.
We will first take the top 50 most streamed songs in the world.
Import our data using the ‘search_spotify’ function to retrieve more detailed information for each track.
artist_audio_features <- map_df(spotifyTop50, function(artist) {
search_spotify(artist, "track") %>%
mutate(artist_name = artist)
})
spotifytopInformation <- spotifyFilter1 %>% group_by(artist_name) %>% arrange(desc(popularity)) %>% slice(1)
head(spotifytopInformation)
## # A tibble: 6 x 5
## # Groups: artist_name [6]
## artist_name id name popularity album.release_d…
## <chr> <chr> <chr> <int> <date>
## 1 24kGoldn 4jPy3l0RUw… Mood (feat. iann… 90 2021-03-26
## 2 Ariana Grande 37BZB0z9T8… Save Your Tears … 90 2021-04-23
## 3 AURORA 3Z0oQ8r78O… Into the Unknown 76 2019-11-15
## 4 Bad Bunny, Jhay Cor… 47EiUVwUp4… DÁKITI 90 2020-10-30
## 5 Billie Eilish 54bFM56PmE… Therefore I Am 88 2020-11-12
## 6 Bruno Mars, Anderso… 7MAibcTli4… Leave The Door O… 96 2021-03-05
Above are the most popular songs by our top artists.
spotifytrackInfo <- spotifytopInformation$id
spotifytrackFeatures <- get_track_audio_features(spotifytrackInfo)
spotifytrackAnalysis <- get_tracks(spotifytrackInfo) %>% select(9,7,10)
head(trackInformation, 2)
## name id
## 1 Mood (feat. iann dior) 4jPy3l0RUwlUI9T5XHBW2m
## 2 Save Your Tears (with Ariana Grande) (Remix) 37BZB0z9T8Xu7U3e65qxFy
## popularity danceability energy key loudness mode speechiness acousticness
## 1 90 0.701 0.716 7 -3.671 0 0.0361 0.1740
## 2 90 0.650 0.825 0 -4.645 1 0.0325 0.0215
## instrumentalness liveness valence tempo type
## 1 0.00e+00 0.3240 0.732 91.007 audio_features
## 2 2.44e-05 0.0936 0.593 118.091 audio_features
## uri
## 1 spotify:track:4jPy3l0RUwlUI9T5XHBW2m
## 2 spotify:track:37BZB0z9T8Xu7U3e65qxFy
## track_href
## 1 https://api.spotify.com/v1/tracks/4jPy3l0RUwlUI9T5XHBW2m
## 2 https://api.spotify.com/v1/tracks/37BZB0z9T8Xu7U3e65qxFy
## analysis_url duration_ms
## 1 https://api.spotify.com/v1/audio-analysis/4jPy3l0RUwlUI9T5XHBW2m 140533
## 2 https://api.spotify.com/v1/audio-analysis/37BZB0z9T8Xu7U3e65qxFy 191014
## time_signature
## 1 4
## 2 4
The Spotify for Developers App does a great job of analyzing the musical characteristics of each and every song. These features include a song’s “danceability”, “tempo”, “liveness”, “energy”, and “acousticness”.
I want to fit a linear model to see whether any of these characteristics is strongly related to how a track is received by the general public, as measured by its popularity score.
topSongs.lm <- lm(formula = popularity ~ acousticness + liveness + energy + valence + loudness + tempo, data = trackInformation)
summary(topSongs.lm)
##
## Call:
## lm(formula = popularity ~ acousticness + liveness + energy +
## valence + loudness + tempo, data = trackInformation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3924 -2.5995 0.3427 2.9475 9.4733
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.42218 6.41954 14.085 < 2e-16 ***
## acousticness 2.60360 3.42400 0.760 0.45021
## liveness 17.17105 5.60825 3.062 0.00338 **
## energy 6.54156 6.52208 1.003 0.32018
## valence 0.61651 2.86165 0.215 0.83021
## loudness 0.28017 0.37648 0.744 0.45988
## tempo -0.05539 0.02436 -2.273 0.02685 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.639 on 56 degrees of freedom
## Multiple R-squared: 0.3242, Adjusted R-squared: 0.2518
## F-statistic: 4.478 on 6 and 56 DF, p-value: 0.0009048
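Based on the model above, no strong conclusion can be drawn. The fit explains only about a third of the variance in popularity (R-squared of 0.32) across 63 tracks, and only liveness and tempo are significant at the 5% level. Acousticness, energy, valence, and loudness show no detectable relationship with the popularity of a song.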
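To read a regression summary like the one above programmatically, the coefficient table can be filtered by p-value. The sketch below defines a hypothetical helper, `significant_terms()`, and demonstrates it on R's built-in `mtcars` data as a stand-in, since `trackInformation` is not available here.

```r
# Hypothetical helper: rows of the coefficient table whose p-value is below alpha,
# leaving out the intercept.
significant_terms <- function(fit, alpha = 0.05) {
  coefs <- coef(summary(fit))
  keep  <- coefs[, "Pr(>|t|)"] < alpha & rownames(coefs) != "(Intercept)"
  coefs[keep, , drop = FALSE]
}

# Stand-in example on built-in data (not the Spotify tracks)
demo_fit <- lm(mpg ~ wt + qsec, data = mtcars)
significant_terms(demo_fit)
```

Applied to `topSongs.lm`, the same call would return the liveness and tempo rows, whose reported p-values (0.00338 and 0.02685) are the only ones below 0.05.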
trackInfo <- gather(trackInformation, 'danceability':'tempo', key = 'characteristic', value = 'value')
ggplot(trackInfo, aes(value, popularity)) + geom_point() + facet_wrap(~characteristic, ncol = 5, scales = "free_x")
head(spotify_globalArtists, 10)
## # A tibble: 10 x 2
## Artist Streams
## <chr> <dbl>
## 1 Justin Bieber 29802117
## 2 Polo G 24672052
## 3 The Weeknd 23630771
## 4 Doja Cat 21550705
## 5 Dua Lipa 19356819
## 6 Olivia Rodrigo 18246898
## 7 Juice WRLD 18018010
## 8 Lil Nas X 17989509
## 9 Drake 17702858
## 10 Pop Smoke 13205093
mostStreamed
While an artist might have a massive number of streams, this doesn’t mean they have a successful music career overall. Having multiple songs in the top 200 could be seen as a higher benchmark for success.
Counting how many songs an artist has in the global top 200.
spotify_globalAppearances
## # A tibble: 514 x 2
## # Groups: Artist [514]
## Artist n
## <chr> <int>
## 1 Damso 15
## 2 Christian Nodal 10
## 3 Duki 8
## 4 Morgan Wallen 8
## 5 Juice WRLD 7
## 6 Bad Bunny 6
## 7 Ed Sheeran 6
## 8 Justin Bieber 6
## 9 AJ Tracey 5
## 10 Camilo 5
## # … with 504 more rows
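The counting step behind `spotify_globalAppearances` is not shown. A simple way to get such a table is to tally how often each artist string occurs; `count_appearances()` below is a hypothetical base R sketch of that idea.

```r
# Hypothetical helper: number of chart entries per artist string, most frequent first.
count_appearances <- function(df) {
  counts <- as.data.frame(table(Artist = df$Artist),
                          responseName = "n", stringsAsFactors = FALSE)
  counts[order(-counts$n, counts$Artist), ]
}
```

A joint credit such as "Bizarrap, L-Gante" counts as its own artist in this sketch; splitting on the comma first would attribute the track to each collaborator instead.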
spotify_globalAppearances %>%
with(wordcloud(words = Artist, n, max.words = 30, random.order = FALSE, colors = brewer.pal(8, "Dark2")))
Note that Lil Nas X, the creator of “Old Town Road”, is not even in the top 10 by number of songs in the top 200, even though he has dominated the streaming numbers. We call this type of phenomenon a one-hit wonder.
Since we couldn’t figure out the characteristics that create a popular song, let’s find out what creates a popular artist.
Using the information from our word cloud, let’s see how our artists use their music to capture their listeners.
Grabbing data on our Artists
Let’s use the ‘spotifyr’ package once again to retrieve discography information for some of the more popular artists.
For example:
Billie_Eilish <- getArtist_Information('billie eilish')
## Warning: `mutate_()` was deprecated in dplyr 0.7.0.
## Please use `mutate()` instead.
## See vignette('programming') for more help
head(Billie_Eilish, 3)
## artist_name album_type album_release_date danceability energy key loudness
## 1 Billie Eilish album 2019-03-29 0.000 0.278 1 -21.630
## 2 Billie Eilish album 2019-03-29 0.701 0.425 7 -10.965
## 3 Billie Eilish album 2019-03-29 0.521 0.125 9 -17.832
## mode speechiness acousticness instrumentalness liveness valence tempo
## 1 1 0.000 0.768 0.00000 0.669 0.0000 0.000
## 2 1 0.375 0.328 0.13000 0.100 0.5620 135.128
## 3 1 0.239 0.751 0.00207 0.265 0.0528 111.554
## time_signature duration_ms track_name track_number type
## 1 0 13578 !!!!!!! 1 track
## 2 4 194087 bad guy 2 track
## 3 4 243725 xanny 3 track
## album_name key_name mode_name key_mode
## 1 WHEN WE ALL FALL ASLEEP, WHERE DO WE GO? C# major C# major
## 2 WHEN WE ALL FALL ASLEEP, WHERE DO WE GO? G major G major
## 3 WHEN WE ALL FALL ASLEEP, WHERE DO WE GO? A major A major
BillieEilish_Energy <- getEnergy_graph(Billie_Eilish)
BillieEilish_Energy
Let’s create a graph that will compare the energy (intensity) and valence (emotion) for each of our artists.
PNL_Energy <- getEnergy_graph(PNL)
PNL_Energy
SchoolBoy_Energy <- getEnergy_graph(SchoolBoy_Q)
SchoolBoy_Energy
Sebastian_Energy <- getEnergy_graph(Sebastian_Yatra)
Sebastian_Energy
PostMalone_Energy <- getEnergy_graph(Post_Malone)
PostMalone_Energy
Khalid_Energy <- getEnergy_graph(Khalid)
Khalid_Energy
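`getEnergy_graph()` is called for each artist but never defined in the slides. A minimal sketch of a comparable plot, assuming it is a ggplot2 scatter of energy against valence with the midpoints marked (the body is a hypothetical reconstruction):

```r
library(ggplot2)

# Hypothetical reconstruction of getEnergy_graph(); the original is not shown.
getEnergy_graph <- function(artist_df) {
  ggplot(artist_df, aes(x = valence, y = energy)) +
    geom_point(alpha = 0.6) +
    # Dashed lines at 0.5 split the plane into four mood quadrants
    geom_hline(yintercept = 0.5, linetype = "dashed") +
    geom_vline(xintercept = 0.5, linetype = "dashed") +
    coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
    labs(title = unique(artist_df$artist_name)[1],
         x = "Valence", y = "Energy")
}
```

Tracks in the lower-left quadrant are both low in energy and low in valence, which is the region the closing remarks refer to.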
We can tentatively conclude that some of the most popular music of this day and age is low in intensity (energy) and dark in mood (low valence).
However the biggest point I want to make is: Music tastes aren’t objective.
A lot of our enjoyment in music comes from our socioeconomic backgrounds, how our environment has influenced us, and what’s readily available for us to listen to.